Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove deprecations and CategoricalArrays.jl dependency #2554

Merged
merged 10 commits into from
Nov 27, 2020

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Nov 20, 2020

No description provided.

@bkamins
Copy link
Member Author

bkamins commented Nov 20, 2020

It seems that removing CategoricalArrays.jl dependency does not have much impact on compilation times now:

Master

julia> @time using DataFrames;
  1.388877 seconds (2.29 M allocations: 147.759 MiB, 7.78% gc time)

julia> @time df = DataFrame(:a => [1,2], :b => [3,4]);
  0.192164 seconds (190.26 k allocations: 10.219 MiB)

julia> @time gdf = groupby(df, :a);
  0.504278 seconds (840.73 k allocations: 42.139 MiB, 3.94% gc time)

julia> @time combine(gdf, nrow);
  0.943988 seconds (1.80 M allocations: 92.772 MiB, 2.72% gc time)

julia> @time combine(gdf, :b => sum);
  0.185167 seconds (642.57 k allocations: 33.666 MiB, 4.39% gc time)

julia> @time transform(gdf);
  0.751447 seconds (2.82 M allocations: 148.360 MiB, 3.53% gc time)

This PR:

julia> @time using DataFrames;
  0.829970 seconds (916.92 k allocations: 63.014 MiB)

julia> @time df = DataFrame(:a => [1,2], :b => [3,4]);
  0.211642 seconds (157.83 k allocations: 8.434 MiB)

julia> @time gdf = groupby(df, :a);
  0.449700 seconds (459.70 k allocations: 23.189 MiB, 14.18% gc time)

julia> @time combine(gdf, nrow);
  0.944649 seconds (1.79 M allocations: 92.054 MiB, 2.79% gc time)

julia> @time combine(gdf, :b => sum);
  0.186040 seconds (642.18 k allocations: 33.630 MiB, 4.17% gc time)

julia> @time transform(gdf);
  0.781098 seconds (2.77 M allocations: 145.770 MiB, 3.76% gc time)

except for package loading time (which was expected to decrease).

@timholy - where would you expect the biggest gains. Thank you!

CC @nalimilan

@timholy
Copy link
Contributor

timholy commented Nov 20, 2020

First, CategoricalArrays no longer invalidates much stuff, so there may be no urgency in getting rid of it. Version 0.9 only invalidates 212 methods, and most of these are probably inconsequential.

Second, when you do invalidate stuff, it's often the things that build on you, not you yourself, that are the big winners. For example, I bet that if Gadfly added a precompile file (using the tricks in SnoopCompile) they'd now have vastly more success.
Basically, what happens is that if loading PkgA invalidates method1, but PkgA doesn't need/use method1, it has literally no impact on PkgA. However, if PkgB uses both PkgA and method1, then PkgB might get hammered by depending on PkgA. If PkgA stops invalidating method1, PkgB wins even if it doesn't help PkgA at all.

@bkamins
Copy link
Member Author

bkamins commented Nov 20, 2020

Thank you for the explanation.

so there may be no urgency in getting rid of it.

As you have commented on Slack - we have already removed the dependency in 0.22 release. Here we only remove the actual using statement essentially as we needed one round of releases before doing such a change due to our deprecation policy.

Still - even if CategoricalArrays.jl is much more compiler friendly now (which I agree it is) it is better anyway to have no dependence between the packages and make them interoperate via DataAPI.jl interface.

src/deprecated.jl Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/abstractdataframe/abstractdataframe.jl Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins
Copy link
Member Author

bkamins commented Nov 26, 2020

@nalimilan - this should be good for a review

src/deprecated.jl Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Nov 26, 2020

updated after the review

@bkamins
Copy link
Member Author

bkamins commented Nov 26, 2020

@nalimilan - I have just checked that we have a significant regression on Julia nightly (the timings are the for the same operations as ones above):

julia> @time using DataFrames
  0.288947 seconds (747.34 k allocations: 52.368 MiB, 14.73% compilation time)

julia> @time df = DataFrame(:a => [1,2], :b => [3,4]);
  0.220673 seconds (266.01 k allocations: 16.396 MiB, 5.58% gc time, 99.95% compilation time)

julia> @time gdf = groupby(df, :a);
  1.435309 seconds (6.04 M allocations: 373.123 MiB, 5.68% gc time, 99.99% compilation time)

julia> @time combine(gdf, nrow);
  3.053012 seconds (9.99 M allocations: 610.595 MiB, 7.72% gc time, 99.96% compilation time)

julia> @time combine(gdf, :b => sum);
  0.305407 seconds (542.59 k allocations: 34.092 MiB, 3.56% gc time, 99.79% compilation time)

julia> @time transform(gdf);
  0.173324 seconds (241.60 k allocations: 15.545 MiB, 5.98% gc time, 99.43% compilation time)

@nalimilan
Copy link
Member

Wow, with 99% spent in compilation time, that's a serious regression in Julia. Can you file a bug there?

@bkamins bkamins merged commit 2d0413a into JuliaData:master Nov 27, 2020
@bkamins bkamins deleted the remove_deprecations branch November 27, 2020 09:33
@bkamins
Copy link
Member Author

bkamins commented Nov 27, 2020

Thank you!

@simeonschaub
Copy link

simeonschaub commented Mar 6, 2021

Is there a reason CategoricalArrays is still listed as a dependency in the Project.toml file? IIUC, it should now only be a test dependency.

@bkamins
Copy link
Member Author

bkamins commented Mar 6, 2021

This must have been a mistake when rebasing PR. It should be only test dependency. I will fix it.

@bkamins
Copy link
Member Author

bkamins commented Mar 6, 2021

see #2646 (as I am not 100% certain about Project.toml semantics)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants